Statistical Analysis for Thesaurus Construction using an Encyclopedic Corpus

Authors

  • Yasunori Ohishi
  • Katunobu Itou
  • Kazuya Takeda
  • Atsushi Fujii
Abstract

Conclusion

  • Discrimination of the hierarchical relation of a word pair using an encyclopedic corpus called the Cyclone corpus.
  • In order not to miss an indirect relationship, a semantic expansion technique for descriptions is used.
  • The proposed method is able to detect 66.1% of relations.

Future work

  • Discrimination between hierarchical and synonymous relations.

Previous work

To extract hyponyms, synonyms, and hypernyms, the following are used:

  • Sentences that have specific syntactic patterns ("a part of", "is a", "such as") (Marti, 1992; Tsurumaru, 1991)
  • Descriptions in a dictionary (Suzuki, 2003)
  • Specific document structure (Shinzato, 2004)
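The syntactic-pattern approach cited under previous work can be illustrated with a minimal sketch. This is not the method proposed in the paper, which relies on statistical analysis of the Cyclone corpus; it only shows how surface patterns such as "is a" and "such as" can yield candidate hyponym/hypernym pairs. The function name `extract_hypernym_pairs` and both regular expressions are hypothetical simplifications that match single words, where a real system would match parsed noun phrases.

```python
import re

# Hypothetical, simplified surface patterns (single-word terms only).
# "X is a Y"     -> X is a candidate hyponym of Y
# "Y such as X"  -> X is a candidate hyponym of Y
IS_A = re.compile(r"(\w+)\s+is\s+an?\s+(\w+)", re.IGNORECASE)
SUCH_AS = re.compile(r"(\w+)\s+such\s+as\s+(\w+)", re.IGNORECASE)


def extract_hypernym_pairs(sentence):
    """Return a list of (hyponym, hypernym) candidates found in one sentence."""
    pairs = []
    for match in IS_A.finditer(sentence):
        pairs.append((match.group(1).lower(), match.group(2).lower()))
    for match in SUCH_AS.finditer(sentence):
        # In "Y such as X" the hypernym comes first, so swap the groups.
        pairs.append((match.group(2).lower(), match.group(1).lower()))
    return pairs


if __name__ == "__main__":
    print(extract_hypernym_pairs("A sparrow is a bird."))
    print(extract_hypernym_pairs("Instruments such as guitars are popular."))
```

Patterns of this kind only fire when both words co-occur in one sentence, which is one reason a corpus-wide statistical method with semantic expansion may recover relations that the patterns miss.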

Similar resources

Statistical Thesaurus Construction for a Morphologically Rich Language

Corpus-based thesaurus construction for Morphologically Rich Languages (MRL) is a complex task, due to the morphological variability of MRL. In this paper we explore alternative term representations, complemented by clustering of morphological variants. We introduce a generic algorithmic scheme for thesaurus construction in MRL, and demonstrate the empirical benefit of our methodology for a Heb...

Working on a botanic corpus

Extracting information from an encyclopedic botanical corpus may be done by hand, but it is long and tedious work. More and more, it becomes interesting and possible to speed up the process by automating it while still keeping a human expert for validation. Among the different kinds of information that may be extracted from a botanic corpus, we can cite terminology, conceptual information to ...

A Method of Automatic Hypertext Construction from an Encyclopedic Dictionary of a Specific Field

Nowadays, a very large volume of texts is created and stored in computers, and as a result the retrieval of texts which fit a user's demand has become a difficult problem. Hypertext is a typical system to answer this problem, whose primary objective is to establish flexible associative links between relevant text parts and to allow users to select and trace links to see relev...

Knowledge Acquisition: Classification of Terms in a Thesaurus from a Corpus

Faced with growing volume and accessibility of electronic textual information, information retrieval, and, in general, automatic documentation require updated terminological resources that are ever more voluminous. A current problem is the automated construction of these resources (e.g., terminologies, thesauri, glossaries, etc.) from a corpus. Various linguistic and statistical methods to ...

Automatic Thai Ontology Construction and Maintenance System

Ontology is an essential resource to enhance the performance of Information Processing system such as information integration, document classification in taxonomies, including information retrieval and data cleaning in database system. This paper proposes three methodologies for Automatic Thai Ontology Construction and Maintenance from technical corpus, dictionary and thesaurus. For corpus base...

Publication date: 2006